Data Mining And Data Warehousing

Reconstruction-based Outlier Detection

Reconstruction-based outlier detection methods identify outliers by measuring how well each data point can be reconstructed from a compressed or transformed representation of the original data. The core idea is that the compressed representation captures the dominant pattern in the data, so normal points can be reconstructed with low error, whereas outliers have a high reconstruction error. Principal Component Analysis (PCA) is one such reconstruction-based outlier detection method.

For example, imagine a dataset with two highly correlated features, f₁ and f₂. If the data points are plotted in 2D space, most of them lie along a diagonal line.

Consider the data points (2,2), (3,4), (4,6), (5,8), (6,10); these are normal data. Now consider another data point, (11,3); this is an outlier. The normal points follow the linear relationship f₂ = 2 * f₁ - 2, but the outlier (11,3) does not fit this pattern: for f₁ = 11 the relationship gives f₂ = 20, not 3.
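
The fit of each point to the relationship can be checked directly. The short Python sketch below is only an illustration; the helper name residual is hypothetical and simply measures how far f₂ is from 2 * f₁ - 2.

```python
# The five normal points from the example, followed by the outlier (11, 3).
points = [(2, 2), (3, 4), (4, 6), (5, 8), (6, 10), (11, 3)]

def residual(f1, f2):
    """Difference between the observed f2 and the value 2 * f1 - 2."""
    return f2 - (2 * f1 - 2)

residuals = [residual(f1, f2) for f1, f2 in points]

for point, r in zip(points, residuals):
    print(point, r)
# The first five points print a residual of 0; (11, 3) prints -17.
```

A residual of 0 means the point lies exactly on the line; the large residual for (11,3) is what marks it as unusual.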

Step-by-Step

First, take the original data points.
Then construct a compressed or transformed representation of the original data points (in PCA, for example, their projection onto the leading principal components).
Then reconstruct the original points from the compressed or transformed representation.
Calculate the error for each original data point by comparing it with its corresponding reconstructed point.
Normal data points will have low errors, while outliers will have high errors.
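
The steps above can be sketched in Python for the two-feature example. This is a minimal illustration under stated assumptions, not a definitive implementation. It uses PCA with a single principal component, found with the closed-form angle of the leading eigenvector of a 2x2 covariance matrix. The function names fit_pca_1d, reconstruct and reconstruction_error are hypothetical. The principal direction is fitted on the five normal points only, because in a dataset this small a single outlier can noticeably tilt the fitted direction.

```python
import math

normal_points = [(2, 2), (3, 4), (4, 6), (5, 8), (6, 10)]
outlier = (11, 3)

def fit_pca_1d(points):
    """Return the mean and the unit direction of the first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the leading eigenvector of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def reconstruct(point, mean, direction):
    """Compress a 2D point to one coordinate, then map it back to 2D."""
    cx, cy = point[0] - mean[0], point[1] - mean[1]
    z = cx * direction[0] + cy * direction[1]  # compressed 1D representation
    return (mean[0] + z * direction[0], mean[1] + z * direction[1])

def reconstruction_error(point, mean, direction):
    """Euclidean distance between a point and its reconstruction."""
    rx, ry = reconstruct(point, mean, direction)
    return math.hypot(point[0] - rx, point[1] - ry)

# Assumption: fit on the normal points only, then score every point.
mean, direction = fit_pca_1d(normal_points)
normal_errors = [reconstruction_error(p, mean, direction) for p in normal_points]
outlier_error = reconstruction_error(outlier, mean, direction)

for p, e in zip(normal_points, normal_errors):
    print(p, round(e, 4))
print(outlier, round(outlier_error, 4))
```

Under these assumptions the five normal points reconstruct with essentially zero error, while (11,3) has an error of about 7.6, which is its distance from the line f₂ = 2 * f₁ - 2. In practice a threshold on the error would typically be chosen to separate normal points from outliers.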
